Abstract

Choosing an appropriate set of stimuli is essential in order to char- 
acterize the response of a sensory system to a particular functional 
dimension, such as the eye movement following the motion of a vi- 
sual scene. Here, we describe a framework to generate random texture 
movies with controlled information content, i.e., Motion Clouds. These 
stimuli are defined using a generative model which is based on con- 
trolled experimental parametrization. We show that Motion Clouds 
correspond to dense mixing of localized moving gratings with random 
positions. Their global envelope is similar to natural-like stimulation 
with an approximate full-field translation corresponding to a retinal 
slip. We describe the construction of these stimuli mathematically and 
propose an open-source Python-based implementation. Examples of
the use of this framework are shown. We also propose extensions to 
other modalities such as color vision, touch and audition.

1 Introduction

One of the objectives of systems neuroscience is to understand how sensory
information is encoded and represented in the central nervous system, from
single neurons to populations of cells forming columns, maps and large-scale
networks. Unveiling how sensory-driven behaviors such as perception or action
are elaborated requires deciphering the role of each processing stage, from
peripheral sensory organs up to associative sensory cortical areas. There is a
long tradition of probing each of these levels using standardized stimuli of low 
dimension and simple statistics. They are based on a powerful, but stringent 
theoretical approach that considers the visual system as a spatiotemporal 
frequency analyzer [Graham, 1979, Watson et al., 1983]. Accordingly, visual 
neurons have long been tested with drifting gratings in order to characterize
both their selectivities and some of the nonlinear properties of their receptive
fields [DeValois and DeValois, 1988]. A similar approach was applied at
both mesoscopic and macroscopic scales to define functional properties of 
cortical maps (e.g. [Blasdel and Salama, 1986, Ts'o et al., 1990]) and areas 
(e.g. [Henriksson et al., 2008, Singh et al., 2000]), respectively. 

A more recent trend has been to consider sensory pathways as complex 
dynamical systems. As such, these are able to process high dimensional 
sensory inputs with complex statistics such as encountered during natural 
life. As a consequence, the objective is to understand how the visual brain 
encodes and processes natural visual scenes [Dan et al., 1996]. This has 
led to new theoretical approaches of neuronal information processing [Field, 
1999], as well as to the search for new sets of stimuli for measuring neuronal 
responses to complex sensory inputs (see [Touryan, 2001, Wu et al., 2006]). 
Conflicting views have been put forward on whether natural scenes and
movies should be used straightforwardly for visual stimulation as in [Felsen 
and Dan, 2005] or whether one should rather develop new sets of "artificial" 
stimuli. Importantly, the latter approach has the advantage of being rela- 
tively easy to parametrize and to customize at different spatial and temporal 
scales [Rust and Movshon, 2005]. In brief, it has become a critical challenge 
to elaborate new visual stimuli that fulfill these two constraints: being both
efficient and relevant to probe high-dimensional dynamical systems on the one
hand, and on the other, being easily tailored so that they can be used to 
conduct quantitative experiments at different scales, from single neuron to 
behavior. 

Here, our aim is to provide such a set of stimuli cast into a well-defined 
mathematical framework. We decided to focus on motion detection, as a
good illustration of the search for optimal high-dimensional stimuli. Visual
motion processing is critically involved in several essential aspects of low-
and mid-level vision such as scene segmentation, feature integration and object
recognition (see [Braddick, 1993, Bradley and Goyal, 2008, Burr and
Thompson, 2011] for reviews). It also provides essential visual information
for motor systems, such as the speed and direction of moving objects,
as well as about self-motion. Lastly, it is one of the few systems for which an
integrated approach from single neuronal activity to complex behaviors can 
be achieved using nearly identical experimental conditions, in order to elu- 
cidate the neural bases of perceptual decisions [Newsome, 1997] and motor 
responses (see [Masson and Ilg, 2010] for a collection of examples). 

However, motion perception is highly dynamical and the classical tool- 
box of standard motion stimuli (such as dots, bars, gratings or plaids) is 
now largely outdated and insufficient to understand how the primate brain 
achieves visual motion processing with both high efficiency and short computing
time. To be optimal, a new set of stimuli should be rooted in theoretical
assumptions about how motion information is processed [Watson
and Turano, 1995]. A large body of experimental and theoretical evidence
supports the view that local motion information is extracted through a set
of spatiotemporal frequency analyzers, whose outputs are then integrated
to yield motion direction and amplitude [Adelson and Bergen, 1985, Simoncelli
and Heeger, 1998]. However, we still lack a deep understanding
of several linear (L) and nonlinear (NL) operations needed to extract the 
global motion from the local luminance changes (see [Derrington et al., 2004] 
for a recent review). For instance, it remains unclear how MT neurons can 
encode speed and direction independently of the local spatiotemporal fre- 
quency or orientation content of the image (see [Bradley and Goyal, 2008] 
for a recent review). It is also hard to predict the responses of MT neurons to
dense noise patterns or natural scenes from their spatiotemporal frequency 
selectivity as explored with low-dimension stimuli [Nishimoto and Gallant, 
2011, Priebe et al., 2006]. Lastly, neuronal responses to natural movies are 
more reliable and sparse than when driven by low dimensional stimuli such 
as drifting gratings [Vinje and Gallant, 2000]. 

To overcome these limits, several recent studies have proposed that linearly
combining several frequency channels can partly account for pattern
direction and speed selectivity [Nishimoto and Gallant, 2011, Rust et al.,
2006, 2005]. Still, such multistage L-NL models [Heeger et al., 1996, Simoncelli
and Heeger, 1998] fail to account for most of the response properties
seen with natural scenes (see [Carandini et al., 2005] for a review). One key
issue is to understand how motion information gathered at different scales 
is normalized and weighted before integration as in the divisive normaliza- 
tion version of the L-NL model of motion detection. Natural-like stimuli are 
good probes to further explore the performance of these models [Schwartz 
and Simoncelli, 2001]. However, "raw" natural scenes have the major draw- 
back that information content is poorly controlled: Their dimensionality is 
extremely high and the inter-stimulus variability in the information content 
with respect to sensory parameters is large [Rust and Movshon, 2005]. Pop- 
ular alternatives to natural scenes are dense and sparse noise. However, 
those are often irrelevant to the sensory system and most often fail to drive 
strong neuronal responses. Here, we explore a new approach for the charac- 
terization of the first-order motion system. Our stimuli are equivalent to a 
sub-class of random phase textures (RPTs) [Galerne et al., 2010], which are 
increasingly attracting interest in exploring neural mechanisms of texture 
perception (e.g. [Solomon et al., 2010]).

The paper is organized as follows. In the Method section, we first re- 
call the main properties of RPTs as originally defined in computer vision 
for texture analysis. Next, we define their dynamical version, hereafter
called Motion Clouds (MCs), and provide their complete mathematical formulation.
We briefly describe the architecture of our implementation, all
technical details being available as supplementary material, including the
source code. In the Results section, we illustrate the practical use of MCs
for studying several long-standing problems of visual motion processing such
as 2D motion integration, motion segmentation and transparency. For each,
we will compare the usefulness of Motion Clouds relative to existing low-dimensional
stimuli. Finally, we discuss how this approach can be generalized
to different aspects of visual system identification. 

2 Methods 

2.1 Random phase textures and natural retinal motion 

First, Random Phase Textures (RPTs) are defined as generic random mo- 
tion textures that are optimal for probing luminance-based visual processing. 
Most of the information present in a given dynamical image can be divided 
into its geometry (that is, the outline of the objects it represents) and its
distribution of luminance in space and time [Jasinschi et al., 1992, Neri 
et al., 1998, Perrone and Thiele, 2001, 2002]. In the spatiotemporal Fourier 
space this is well separated between the phase and the absolute amplitude 
spectra, respectively [Oppenheim and Lim, 1981]. This can be easily seen 
by gradually perturbing the phase spectrum of a natural scene: while form 
is progressively lost, its global motion information remains essentially unchanged
(see Figure 1). This invariance with respect to phase shuffling in the
Fourier domain is generally considered to be characteristic of the first-order
motion stage [Derrington et al., 2004, Lu and Sperling, 2001].

Figure 1: (A) Top: a natural movie with the main motion component
consisting of a horizontal, rightward full-field translation. Such a movie
would be produced by an eye movement with constant mean velocity to
the left (negative x direction), plus some residual, centered jitter noise in
the motion-compensated natural scene. We represent the movie as a cube,
whose (x, y, t = 0) face corresponds to the first frame, while the (x, y = 0, t) face
shows the rightward translation motion as diagonal stripes. As a result of
the horizontal motion direction, the (x = 54, y, t) face is a reflected image
of the (x, y, t = 0) face, contracted or dilated depending on the amplitude
of motion. The bottom panel shows the corresponding Fourier energy spectrum,
as well as its projections onto three orthogonal planes. For any given
point in frequency space, the energy value with respect to the maximum
is coded by 6 discrete color iso-surfaces (i.e., 90%, 75%, 50%, 25%, 11%
and 6% of peak). The amplitude of the Fourier energy spectrum has been
normalized to 1 in all panels and the same conventions used here apply
to all following figures. (B) to (C): The image is progressively morphed (A
through B to C) into a Random Phase Texture by perturbing independently
the phase of each Fourier component. (Upper row): Form is gradually lost
in this process, whereas (lower row): most motion energy information is
preserved, as it is concentrated around the same speed plane in all three
cases (the spectral envelopes are nearly identical).

Figure 2: From an impulse to a Motion Cloud. (A): The movie corresponding
to a typical "edge", i.e., a moving Gabor patch that corresponds to a
localized grating. The Gabor patch being relatively small, for clarity, we
zoomed 8 times into the non-zero values of the image. (B): By densely
mixing multiple copies of the kernel shown in (A) at random positions, we
obtain a Random Phase Texture (RPT), see Supplemental Movie 1. (C): We
show here the envelope of the Fourier transform of kernel K: inversely, K
is the impulse response in image space of the filter defined by this envelope.
Due to the linearity of the Fourier transform, apart from a multiplicative
constant that vanishes by normalizing the energy of the RPT to 1, the spectral
envelope of the RPT in (B) is the same as that of the kernel K shown
in (A): $\mathcal{E}_{\bar{\beta}} = \mathcal{F}(K)$. Note that the spectral energy envelope of a "classical"
grating would result in a pair of Dirac delta functions centered on the peak
of the patches in (C) (the orange "hot-spots"). Motion Clouds are defined
as the subset of such RPTs whose main motion component is a full-field
translation and which are thus characterized by spectral envelopes concentrated on a
plane.
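The phase perturbation described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used to produce Figure 1: the function name `perturb_phase` and its `amount` parameter are hypothetical, and the noise phase is taken from the spectrum of a real white-noise movie so that the perturbed movie stays real-valued.

```python
import numpy as np

def perturb_phase(movie, amount=1.0, seed=0):
    """Perturb the Fourier phase of a movie while keeping its amplitude spectrum.

    `amount` scales the random phase that is added (0: unchanged movie).
    Hypothetical helper written for illustration only.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fftn(movie)
    # Phase of a real white-noise movie: it is antisymmetric in frequency,
    # so for amount=1 the perturbed spectrum remains that of a real movie.
    noise_phase = np.angle(np.fft.fftn(rng.standard_normal(movie.shape)))
    new_phase = np.angle(spectrum) + amount * noise_phase
    # For intermediate amounts a small imaginary residue is discarded here.
    return np.fft.ifftn(np.abs(spectrum) * np.exp(1j * new_phase)).real

# Stand-in for a natural movie: any real (x, y, t) array works.
movie = np.random.default_rng(1).standard_normal((8, 8, 8))
scrambled = perturb_phase(movie, amount=1.0)
```

With `amount=1.0` the amplitude spectrum of `scrambled` matches that of `movie`, which is the sense in which motion energy is preserved while form is lost.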

We next formally define a linear generative model for the synthesis of
such natural-like moving textures. Most generally, we can describe luminance
at position (x, y) and time t as the scalar I(x, y, t) that is the sum of
the contributions of a set of basis functions:

$$I(x, y, t) = \sum_k a_k \, G(x, y, t; \beta_k) \qquad (1)$$

The function G defines the family of basis functions, where each basis function
is defined by parameters $\beta_k$. Scalars $a_k$ give the relative amplitude for
each basis function and therefore will change for each individual image I,
while the parameters $\beta_k$ are fixed for a set of stimuli. The advantage of
this generative model is to separate the temporal scale of coding a specific
moving stimulus (represented by the scalars $a_k$) from the temporal scale of
a whole set of stimuli (as represented by the $\beta_k$). Efficient coding strategies
use such generative models by optimizing the scalars $a_k$ knowing a fixed set of
basis functions $\beta_k$. Note that finding the optimal set $a_k$ knowing this linear
generative model and the image I is in general a non-linear problem (it is
the coding problem). When the set given by $\beta_k$ is large, this problem becomes
difficult. In that context, divisive normalization gives a fair account
of this problem, using second-order correlations across basis
functions for its solution [Schwartz and Simoncelli, 2001]. On a slower temporal scale, such a
model is used in neural modeling for studying the emergence of receptive
fields by optimizing $\beta_k$, such as by using a Bayesian framework [Perrinet,
2010].

In Fourier space, by linearity:

$$\mathcal{F}(I)(f_x, f_y, f_t) = \sum_k a_k \cdot \mathcal{F}(G)(f_x, f_y, f_t, \beta_k) \qquad (2)$$

where $\mathcal{F}$ is the Fourier transform. Here, we will use a fixed set of spatiotemporal
Gabor kernels to implement localized, moving grating-like textures as
they are known to efficiently code for natural images. For this particular set,
parameters are defined as the peak's spatiotemporal position $\{x_k, y_k, t_k\}$, velocity
(direction and speed), orientation and scale. We may write the translation
of each component using the shift operator in the Fourier domain:

$$\mathcal{F}(I)(f_x, f_y, f_t) = \sum_k a_k \cdot \mathcal{E}_{\bar{\beta}_k}(f_x, f_y, f_t) \cdot e^{-2i\pi (f_x x_k + f_y y_k + f_t t_k)} \qquad (3)$$

where $\bar{\beta}_k$ denotes the parameters without positions $\{x_k, y_k, t_k\}$. In general,
parameters $\beta_k$ have some statistical regularities in Fourier space: For instance,
velocity, orientation and scale parameters are correlated in space
and time [Lagae et al., 2009, Lewis, 1984]. This defines an average spectral
density envelope that we denote as $\mathcal{E}_{\bar{\beta}}$ and which is characteristic of the
particular class of natural images that is coded [Torralba and Oliva, 2003].

We use this generative model to define RPTs and their Motion Clouds
derivative, which can be seen as first-order motion textures. If we shift the
central position of edges randomly and independently (see Figure 1), and if
this perturbation is stochastically independent from the distribution of
the other parameters, one can describe the image by the following mean-field
equation on its Fourier transform:

$$\mathcal{F}(I)(f_x, f_y, f_t) = \mathcal{E}_{\bar{\beta}}(f_x, f_y, f_t) \cdot \sum_k a_k \, e^{-2i\pi (f_x x_k + f_y y_k + f_t t_k)} \qquad (4)$$

As a consequence, the envelope is modulated by a stochastic spectrum that
is defined at any point in Fourier space as the sum of random independent
variables with the same distribution and variance. By virtue of the central
limit theorem, we may define the set of stimuli I as the random sequences
generated by 1) an average envelope $\mathcal{E}_{\bar{\beta}}$, 2) a normally distributed, iid amplitude
spectrum A, 3) a uniformly distributed phase spectrum $\Phi$ in $[0, 2\pi)$,
that is, as Random Phase Textures (RPT) [Galerne et al., 2010] trivially
extended to the spatiotemporal domain:

$$\mathcal{F}(I) = \mathcal{E}_{\bar{\beta}} \cdot A \cdot e^{i\Phi} \qquad (5)$$

As noted by [Galerne et al., 2010], the main visual ingredients of RPTs
are the envelope spectrum and the random phase spectrum, while A has
little perceptual effect. Indeed, removing the random amplitude spectrum,
we still have a random fluctuation of the sign of each Fourier coefficient.
From the central limit theorem, under the condition that the number of
mixed components is large enough, the coefficient spectrum resulting from
the mixing described by

$$\mathcal{F}(I) = \mathcal{E}_{\bar{\beta}} \cdot e^{i\Phi} \qquad (6)$$

will still be given by a normal distribution and corresponds to a correct
generative model for RPTs as in Equation 5. Stimuli defined by such
equations correspond to band-limited filtering of white noise, that is, to a
white noise (in space and time) linearly filtered by the kernel K, where
$K = \mathcal{F}^{-1}(\mathcal{E}_{\bar{\beta}})$ corresponds to the average impulse response of the texture
(see Figure 2).
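As a minimal sketch of this generative model, a movie can be drawn by multiplying a given envelope by a random phase spectrum and inverting the Fourier transform. The function name `random_phase_texture` and the Gaussian test envelope are assumptions made for illustration, not the supplementary implementation.

```python
import numpy as np

def random_phase_texture(envelope, seed=None):
    """Draw one random phase texture from a spectral envelope.

    `envelope` is a non-negative array sampled on an FFT-ordered
    (f_x, f_y, f_t) grid. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=envelope.shape)
    spectrum = envelope * np.exp(1j * phase)
    # Keeping the real part amounts to symmetrizing the spectrum, so that
    # the movie is real-valued; the envelope itself is left unchanged
    # whenever it is symmetric with respect to the origin.
    movie = np.fft.ifftn(spectrum).real
    return movie / np.abs(movie).max()

# Illustrative envelope: a Gaussian blob in frequency space (assumed values).
f = np.fft.fftfreq(16)
fx, fy, ft = np.meshgrid(f, f, f, indexing="ij")
envelope = np.exp(-((fx - 0.125) ** 2 + fy ** 2 + (ft + 0.125) ** 2) / (2 * 0.05 ** 2))
movie = random_phase_texture(envelope, seed=1)
```

Drawing a new seed yields a different instance with the same envelope, while reusing a seed regenerates the same movie.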

This class of random, textured, dynamical stimuli has several advantages
over classical narrow-bandwidth, low-entropy stimuli, such as gratings
or combinations of gratings. First, by varying the weight of each Fourier
coefficient, we can vary its content and probe different types of motion in- 
tegration models. Second, we can generate several different series of stimuli 
with different randomization seeds, while keeping all other parameters con- 
stant. Third, we can play with the bandwidth along each dimension to 
titrate the role of distributions of frequencies onto neuronal or behavioral 
responses. Fourth, we can reproduce the statistics of natural images by con- 
trolling the global envelope in Fourier space. Fifth, stochastic properties are 
generated only by varying the phase spectrum, without the need for adding
a noise component to the motion stimulus or controlling the lifetime of individual
features. Below we shall discuss several examples of the experimental
usability of such stimuli. Stimuli similar to RPTs have already been used.
This was first formalized for the generation of natural-like static textures
in computer vision [Lewis, 1984] such as procedural or Perlin textures and
is still largely used [Lagae et al., 2009]. Mathematically, the resulting patterns
are related to the morphogenesis studies pioneered by [Turing, 1952].
Such static textures were used in psychophysics [Essock et al., 2009] and in neurophysiology,
for instance to study the sensitivity of V1 neurons to dynamical
expansion [Wang and Yao, 2011] or nonlinear properties of non-classical receptive
fields of primate MT neurons [Solomon et al., 2010]. It is worth
noting that a similar stimulus design was proposed for investigating another 
sensory system, i.e., audition [Klein et al., 2000, Rieke et al., 1995].

2.2 Motion clouds as one particular type of random phase textures

Defining optimal motion stimuli in order to probe the first-order, luminance-based
motion system is a nontrivial problem. A straightforward approach is
to generate a static texture and then to generate sequences as an exact, full-field
translation of this static texture [Drewes et al., 2008]. However, this approach
is not generic enough. In particular, it lacks the possibility to vary
the distribution of speeds being present in a given movie, a parameter that 
might be crucial to study precision and robustness of motion processing for 
perception or eye movements. Motion Clouds can be defined as the subset 
of RPTs that results from a generative model inspired by a rigid transla- 
tion at central velocity of a large texture filling the whole visual field. This 
generative model will be specified by a central velocity V for the full-field 
translation, plus random independent perturbations of velocities around the 
central velocity, given by a bandwidth $B_V$. As a consequence, the spectral
distribution of energy of such a sequence is centered on and squeezed onto
a plane defined by the normal vector V. The L-NL models of direction
selectivity match the spatiotemporal properties of V1 or MT neuron receptive
fields with this plane. Phase information is concealed using the squared
sum from the activity of receptive fields of odd and even phase [Adelson
and Bergen, 1985]. By definition, Motion Clouds using a similar envelope
as given by the spatiotemporal filtering properties of V1 or MT neurons
are thus equivalently defined as the set of stimuli that are optimally detected
by these energy detectors [Nishimoto and Gallant, 2011]. Moreover,
they are also optimal for motion coding in the information-theoretic sense,
since they maximize entropy [Field, 1994] compared to the presentation of
a simple kernel K as in [Watson and Turano, 1995]. Random textures
similar to Motion Clouds have been generated by displaying a rectangular
grid of Gabor patches with random orientations and directions [Scarfe and
Johnston, 2010]. However, such a regular grid introduces some geometrical
information that may interfere with the processing of motion, as opposed 
to RPTs. Our Motion Clouds are more similar to the texture stimuli in- 
troduced by [Schrater et al., 2000] or to the dynamical displays designed 
by [Tsuchiya and Braun, 2007]. Below we propose one well-defined math- 
ematical formalization for our Motion Clouds before presenting a solution 
for their implementation in psychophysical toolboxes. 

3 Mathematical definition of Motion Clouds 

We define Motion Clouds (MCs) as RPTs that are characterized by several
key features. First, first-order motion information is independent of changes
in the phase of the Fourier coefficients of image sequences since it is contained
in the amplitude of the spectral coefficients (Eq. 5):

$$I(x, y, t) = \mathcal{F}^{-1}\{\mathcal{E}_{\bar{\beta}}(f_x, f_y, f_t) \cdot e^{i\Phi}\} \qquad (7)$$

Second, in Fourier space, full-field, constant, translational motions correspond
to an envelope $\mathcal{E}_{\bar{\beta}}$ whose distribution is concentrated on a speed plane
(a plane in Fourier space that contains the origin). Third, the distribution
of the spectral envelope $\mathcal{E}_{\bar{\beta}}$ is defined as a Gaussian. This is explained by the
fact that Gabor filters have a Gaussian envelope and thus an optimal spread
in Fourier space [Marcelja, 1980]. As such they are well known models of
simple cells in the primary visual cortex [Daugman, 1980] that can describe 
the most salient properties of receptive fields and their tuning for spatial 
localization, orientation and spatial frequency selectivity (e.g. [Jones and 
Palmer, 1987]). Moreover, Lee [1996] derived the conditions under which a 
set of 2D Gabor wavelets are a suitable image basis for a complete repre- 
sentation of any image. This was further extended to the case of sequences 
with a known motion [Watson and Turano, 1995] and therefore constitutes 
an accurate set for studying first-order motion. In summary, envelopes of 
MCs are essentially Gaussian distributions that are concentrated close to a 
speed plane (see Figure 2). Equivalently, these characteristics define MCs as 
dense random mixing of spatiotemporal Gabor filters with similar speeds. 

The implementation presented herein is based on a simplified parametrization
of the envelope of the amplitude spectrum. Given that speed, radial
frequency and orientation spread are independent, we can parametrize different
types of MCs based on a factorization of each component:

$$\mathcal{E}_{\bar{\beta}} = \mathcal{V}_{(V_x, V_y, B_V)} \times \mathcal{G}_{(f_0, B_f)} \times \mathcal{O}_{(\theta, B_\theta)} \times \mathcal{C}_{(\alpha)} \qquad (8)$$

where all envelope parameters are given in their sub-label and envelopes 
correspond respectively to the speed plane, the frequency and the orientation 
tuning along this plane: 

1. For the speed envelope $\mathcal{V}$, two parameters define the motion $V = (V_x, V_y)$
(and thus the speed plane) while one parameter defines the bandwidth
$B_V$ of this plane as we jitter the mean motion V. Varying these parameters
allows one to study the response of motion detectors to different
speeds and amounts of velocity noise.

2. Projected onto the speed plane, we can define (i) the radial frequency
envelope $\mathcal{G}$ with two parameters that set its mean value $f_0$ and bandwidth
$B_f$, and (ii) the orientation envelope $\mathcal{O}$, defined by two
parameters: mean orientation $\theta$ and bandwidth $B_\theta$. In both cases, the
two parameters can be thought of as defining the nominal value and the
uncertainty of each respective component of motion information.

3. An additional envelope $\mathcal{C}$ is parametrized by $\alpha$. It tunes the overall
shape of the envelope similarly to what is observed in natural images.

Note that one can modify the parameters of each envelope independently
and, moreover, by the commutativity of the product operation, the order of
the envelopes is arbitrary. Note also that the actual values
of each of these parameters can be set based on known properties of the
biological system to be investigated, for each level of observation. We will
now detail each of them.

3.1 Speed Envelope 

The first axis of a Motion Cloud is its speed component. Let us first recall
that the Fourier transform of a static image with a global translation motion
is the Fourier transform of the static image (that lies in the $f_t = 0$ plane)
tilted onto the plane set by $V = (V_x, V_y)$ and defined as:

$$V_x f_x + V_y f_y + f_t = 0 \qquad (9)$$

The orientation and tilt of the plane are determined by the direction
and speed of motion, respectively. For larger $V = \|V\| = \sqrt{V_x^2 + V_y^2}$, the
tilt becomes greater. To model speed variability (jitter) within a Motion
Cloud, we shall assume that motion varies slightly along both speed axes (i.e.,
direction and amplitude). Such an envelope is given for instance by:

$$\mathcal{V}_{(V_x, V_y, B_V)}(f_x, f_y, f_t) = \exp\left(-\frac{1}{2}\left(\frac{f_x V_x + f_y V_y + f_t}{B_V \cdot f_r}\right)^2\right) \qquad (10)$$

where $f_r = \sqrt{f_x^2 + f_y^2 + f_t^2}$ is the radial frequency.
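
The speed envelope can be sketched numerically as follows, assuming it takes the Gaussian form $\exp(-\frac{1}{2}((f_x V_x + f_y V_y + f_t)/(B_V f_r))^2)$ around the speed plane. The function name, argument names and default values are illustrative assumptions, not the supplementary code.

```python
import numpy as np

def speed_envelope(fx, fy, ft, v_x=1.0, v_y=0.0, b_v=0.5):
    """Gaussian envelope around the speed plane v_x*fx + v_y*fy + ft = 0."""
    f_r = np.sqrt(fx ** 2 + fy ** 2 + ft ** 2)
    # The origin lies on every speed plane; avoid dividing 0 by 0 there.
    f_r = np.where(f_r == 0, 1.0, f_r)
    distance = fx * v_x + fy * v_y + ft
    return np.exp(-0.5 * (distance / (b_v * f_r)) ** 2)

# FFT-ordered frequency grid (cycles per sample), 16 samples per axis.
f = np.fft.fftfreq(16)
fx, fy, ft = np.meshgrid(f, f, f, indexing="ij")
envelope = speed_envelope(fx, fy, ft, v_x=1.0, v_y=0.0, b_v=0.5)
```

The envelope equals 1 on the speed plane and decays away from it; a smaller `b_v` squeezes the energy more tightly onto the plane.
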
3.2 Radial frequency envelope 

The second characteristic of a Motion Cloud is its radial frequency envelope.
This is defined as the one-dimensional distribution of radial frequency using
spherical coordinates in the Fourier domain. Indeed, by spherical symmetry,
this radial frequency envelope is then independent of motion and orientation
tuning. An intuitive description of this envelope is a Gaussian distribution
along this radial dimension, as is often used to describe the frequency
component of Gabor filters. A drawback of Gabor functions is
that their sum is not exactly zero. This shows up in Fourier space
as a non-zero value at the origin. To overcome this issue we use
log-Gabor filters [Fischer et al., 2007]. A second advantage of using log-Gabor
filters is that they better encode natural images [Field, 1987]. We thus build
a band-pass spatial frequency filter that is Gaussian in the logarithm of
the radial spatial frequency. We define $f_0$ as the mean radial frequency and
$B_f$ as the filter's bandwidth.
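
The explicit expression of this envelope is not reproduced here. As an illustration only, the sketch below uses one common log-Gabor parametrization, a Gaussian in $\ln(f_r / f_0)$ whose width is set by $\ln((f_0 + B_f)/f_0)$; this particular width convention, the function name and the default values are assumptions and may differ from the authors' definition.

```python
import numpy as np

def radial_envelope(fx, fy, ft, f_0=0.125, b_f=0.1):
    """Log-Gabor radial envelope: Gaussian in the logarithm of f_r.

    The width convention log((f_0 + b_f) / f_0) is an assumption made
    for this sketch.
    """
    f_r = np.sqrt(fx ** 2 + fy ** 2 + ft ** 2)
    safe = np.where(f_r == 0, 1.0, f_r)  # placeholder value at the origin
    log_ratio = np.log(safe / f_0) / np.log((f_0 + b_f) / f_0)
    # A log-Gabor filter has no energy at the origin (zero mean luminance).
    return np.where(f_r == 0, 0.0, np.exp(-0.5 * log_ratio ** 2))

zero = np.zeros(1)
at_peak = radial_envelope(np.array([0.125]), zero, zero)
at_origin = radial_envelope(zero, zero, zero)
one_bandwidth = radial_envelope(np.array([0.225]), zero, zero)
```

With this convention the envelope peaks at `f_0`, vanishes at the origin, and falls to $e^{-1/2}$ of its peak at `f_0 + b_f`.
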
3.3 Orientation Envelope

The third property of a Motion Cloud is its orientation envelope. Oriented
structures in space-time yield oriented structures in the Fourier domain.
Thus, the orientation component of the spectrum is given by the function
$\mathcal{O}(f_x, f_y, f_t)$. It is defined by a density function located at a mean orientation
$\theta$ and whose spread is modeled using a von Mises distribution with
parameter $B_\theta$ that represents its bandwidth, made symmetric with
respect to the origin:

$$\mathcal{O}(f_x, f_y) = \exp\left(\frac{\cos(\theta_f - \theta)}{B_\theta}\right) + \exp\left(\frac{\cos(\theta_f - \theta - \pi)}{B_\theta}\right) \qquad (12)$$

where $\theta_f = \arctan(f_x, f_y)$ is the angle in the Fourier domain. Note again
that this envelope is independent of both speed and radial tuning. We
define its bandwidth using the standard deviation $B_\theta$. If $B_\theta$ has a small
value, a highly coherent orientation pattern is generated.
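
A possible numerical sketch of such a symmetric von Mises envelope is given below. It assumes that each lobe has the form $\exp(\cos(\cdot)/B_\theta)$ and that the Fourier angle is measured from the $f_x$ axis with `np.arctan2(fy, fx)`; both lobes are rescaled by the constant factor $\exp(-1/B_\theta)$ to avoid overflow for narrow bandwidths. The function and argument names are hypothetical.

```python
import numpy as np

def orientation_envelope(fx, fy, theta=0.0, b_theta=np.pi / 8):
    """Two von Mises lobes centered on theta and theta + pi.

    Subtracting 1 from each cosine rescales the envelope by exp(-1/b_theta)
    without changing its shape (assumed convention for this sketch).
    """
    delta = np.arctan2(fy, fx) - theta
    lobe = np.exp((np.cos(delta) - 1.0) / b_theta)
    mirror_lobe = np.exp((np.cos(delta - np.pi) - 1.0) / b_theta)
    return lobe + mirror_lobe

one, zero = np.array([1.0]), np.array([0.0])
along = orientation_envelope(one, zero, theta=0.0)        # along theta
opposite = orientation_envelope(-one, zero, theta=0.0)    # point-mirrored
orthogonal = orientation_envelope(zero, one, theta=0.0)   # 90 degrees away
```

The second lobe makes the envelope symmetric with respect to the origin, as required for the spectrum of a real-valued movie.
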



3.4 Spectral color 

An important property of Motion Clouds is their global statistics. Therefore,
the average shape of their power spectrum must be controlled. It has
been shown that the average power spectrum of natural scenes follows a
power law [Field, 1987]:

$$\mathcal{C}_\alpha(f_x, f_y, f_t) = \frac{1}{f_R^\alpha} \qquad (13)$$

where $\alpha$ is usually set within the range $0 < \alpha < 2$. By analogy with the color
terminology used to characterize noise patterns, we call this function the color
envelope. The simplest stimulus that can be built with our model is filtered
spatial noise as a function of the power (exponent) factor $\alpha$. In our model,
we assume that the spectrum shape is independent of orientation and varies
solely as a function of a radial frequency ($f_R$), defined as in [Schrater et al.,
2000] by:

$$f_R = \sqrt{f_x^2 + f_y^2 + \frac{f_t^2}{f_{t_0}^2}} \qquad (14)$$

The factor $f_{t_0}$ is a normalization factor and is associated to a normalized
stimulus velocity.

The color envelope weights the different frequency channels according
to the statistics of natural images and therefore is optimal regarding the
sensitivity of the primate visual system to the different spatiotemporal frequencies
[Atick, 1992]. In the examples given below, we choose $\alpha = 1$,
corresponding to a pink noise distribution. Note that this particular value
allows for the marginal distribution integrated over all orientations to coincide
with the speed and frequency envelope. Qualitatively, this global
envelope changes neither the motion nor the texture appearance of a Motion
Cloud since it has no preferred speed, frequency or orientation. This
is true in particular for Motion Clouds with a relatively narrow envelope in
Fourier space. When using larger bandwidth values of the radial frequency
distribution $B_f$, the shape of the global envelope becomes more important.
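
The power-law color envelope can be sketched as below. The sketch assumes that the temporal frequency is divided by a normalization factor, called `ft_0` here (a hypothetical name), before the radial frequency is computed, and it sets the envelope to zero at the origin, which is an additional assumption made to avoid a division by zero.

```python
import numpy as np

def color_envelope(fx, fy, ft, alpha=1.0, ft_0=1.0):
    """Power-law envelope 1 / f_R**alpha (alpha=0: white, alpha=1: pink)."""
    f_R = np.sqrt(fx ** 2 + fy ** 2 + (ft / ft_0) ** 2)
    safe = np.where(f_R == 0, 1.0, f_R)  # placeholder value at the origin
    # The DC component is set to zero (assumption of this sketch).
    return np.where(f_R == 0, 0.0, safe ** (-alpha))

zero = np.zeros(1)
pink_low = color_envelope(np.array([0.1]), zero, zero, alpha=1.0)
pink_high = color_envelope(np.array([0.2]), zero, zero, alpha=1.0)
white = color_envelope(np.array([0.1]), np.array([0.3]), np.array([0.2]), alpha=0.0)
```

With `alpha=1.0`, doubling the radial frequency halves the envelope, whereas `alpha=0.0` leaves all non-zero frequencies equally weighted.
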

3.5 Implementation 

Since our objective is to provide a new set of stimuli for conducting neuro- 
physiological or psychophysical experiments, we must propose a framework 
for generating and displaying Motion Clouds under well controlled param- 
eter settings. Using standard computer libraries, the theoretical framework 
described above can be implemented while taking into account technical 
constraints such as discretization and videographic displays. In the supplementary
material, we provide the source code used to generate our
calibrated Motion Clouds using Python libraries.
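
To make the overall pipeline concrete, the following self-contained sketch combines the four envelopes and a random phase spectrum into a small movie. It is not the supplementary code: the function name `motion_cloud`, all default values, the log-Gabor width convention and the use of an unnormalized temporal frequency in the color envelope are assumptions made for illustration.

```python
import numpy as np

def motion_cloud(n=32, v_x=1.0, v_y=0.0, b_v=0.5, f_0=0.125, b_f=0.1,
                 theta=0.0, b_theta=np.pi / 8, alpha=1.0, seed=None):
    """Generate one (n, n, n) movie indexed as (x, y, t). Illustrative only."""
    f = np.fft.fftfreq(n)  # FFT-ordered frequencies, cycles per sample
    fx, fy, ft = np.meshgrid(f, f, f, indexing="ij")
    f_r = np.sqrt(fx ** 2 + fy ** 2 + ft ** 2)
    nonzero = f_r > 0
    f_safe = np.where(nonzero, f_r, 1.0)  # placeholder value at the origin

    # Speed envelope: Gaussian around the plane v_x*fx + v_y*fy + ft = 0.
    speed = np.exp(-0.5 * ((fx * v_x + fy * v_y + ft) / (b_v * f_safe)) ** 2)
    # Radial envelope: log-Gabor centered on f_0 (assumed width convention).
    radial = np.exp(-0.5 * (np.log(f_safe / f_0)
                            / np.log((f_0 + b_f) / f_0)) ** 2)
    # Orientation envelope: two von Mises lobes, rescaled by exp(-1/b_theta).
    delta = np.arctan2(fy, fx) - theta
    orientation = (np.exp((np.cos(delta) - 1.0) / b_theta)
                   + np.exp((np.cos(delta - np.pi) - 1.0) / b_theta))
    # Color envelope: power law in radial frequency.
    color = f_safe ** (-alpha)
    # Product of the four envelopes, with the DC component removed.
    envelope = speed * radial * orientation * color * nonzero

    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=envelope.shape)
    movie = np.fft.ifftn(envelope * np.exp(1j * phase)).real
    return movie / np.abs(movie).max()

movie = motion_cloud(n=16, seed=42)
```

Calling `motion_cloud` twice with the same seed returns the same movie, while a different seed gives another instance with an identical envelope. Displaying the result on calibrated hardware additionally requires mapping the values in [-1, 1] to the luminance range of the screen, which is outside the scope of this sketch.
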

4 Results 

In order to illustrate how Motion Clouds can be used to investigate different 
aspects of motion processing, we now describe some of their applications. 
We emphasize how classical stimuli, such as gratings or plaid patterns, can 
be conveniently represented as Motion Clouds. This last aspect is important: 
Motion Clouds can be seen as a single class of motion stimuli encompassing 
both low-dimension and complex dynamical stimuli. It then becomes possible
to parametrically investigate the effects of spatiotemporal frequency
content upon different stages of motion processing. Note that
all the following examples are chosen to fit the characteristics of
visual motion systems; yet, the same logic applies to other aspects of visual 
processing, such as texture or shape perception. 

4.1 Motion Clouds equivalents of classical stimuli 

Sinusoidal luminance gratings are defined by a small set of parameters (ori- 
entation, direction, frequency). This translates naturally into a set of Motion 
Clouds with the parameters that we defined: speed, orientation, frequency. 
In addition, we now have the choice of 3 extra parameters $B_V$, $B_\theta$, $B_f$
that tune the bandwidth along each of these components, respectively (see
Figure 3-left). It thus becomes possible to investigate spatial or temporal 
frequency, orientation or direction selectivity, as well as the role of their 
respective tuning bandwidths. 

With drifting gratings, the perceived motion direction is necessarily
perpendicular to the grating's orientation. This is related to the aperture
problem: the translation of a 1D elongated edge is ambiguous and its visual
motion is compatible with an infinite number of velocity vectors [Movshon
et al., 1985]. A novel formulation of this problem can be designed by
creating a Motion Cloud whose direction is not perpendicular to the main
orientation and whose orientation bandwidth is very narrow. Indeed, a
classical motion detector would then be incapable of determining
unambiguously the speed plane that corresponds to such an envelope (see
Figure 3-middle).
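
The ambiguity can be made concrete with a small numerical sketch. For a 1D
pattern, only the velocity component along the direction of luminance
variation is measurable, so every velocity sharing that component produces
the same image motion. The function name compatible_velocities and the
numerical values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def compatible_velocities(v_true, normal_angle, tangential_speeds):
    """Velocities indistinguishable from v_true for a 1D pattern.

    normal_angle is the direction (in radians) along which luminance varies,
    i.e. perpendicular to the edge or to the bars of a grating.
    """
    normal = np.array([np.cos(normal_angle), np.sin(normal_angle)])
    tangent = np.array([-np.sin(normal_angle), np.cos(normal_angle)])
    normal_speed = np.dot(v_true, normal)  # the only measurable component
    # Any additional motion along the edge leaves the image unchanged.
    return normal_speed * normal + np.outer(tangential_speeds, tangent)


v_true = np.array([1.0, 0.0])   # horizontal, rightward motion
normal_angle = np.pi / 4        # oblique pattern (hypothetical value)
candidates = compatible_velocities(v_true, normal_angle, np.linspace(-2, 2, 9))
normal = np.array([np.cos(normal_angle), np.sin(normal_angle)])
print(candidates @ normal)      # identical normal speed for every candidate
```

The candidate with zero tangential speed points orthogonally to the
orientation, which is the direction towards which perception is biased for
such a stimulus.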

Motion Clouds also encompass textures similar to Random-Dot
Kinematograms (RDKs). Usually, RDKs consist of a set of small dots drifting
in a given direction and at a given speed, each dot having a limited
lifetime. This is similar to our original definition of Motion Clouds. Such
a pattern is defined in Equation 1 with a kernel that would correspond to a
Dirac delta function in space, a square ON and OFF function in time and a
sparse set of coefficients aj. Note that this kernel would correspond to a
flat envelope on the speed plane with a bandwidth proportional to the
inverse of the lifetime of the dots. This is therefore controlled in Motion
Clouds by the parameter BV and indeed, we observe that smaller values
induce "features" which last longer. We stress, however, that Motion Clouds
are necessarily equivalent to dense, not sparse, noise patterns.

Moreover, each Motion Cloud is generated by a fully known,
computer-generated noise. It is therefore possible to regenerate exactly the
same stimulus by using the same seed in the random number generator. This
property makes it possible to investigate inter-trial variability and thus
the relative importance, for the system at hand, of external noise
(measurement noise) and internal noise (uncertainty due to ambiguities and
mixtures in the signal representation). This approach corresponds to the use
of frozen noise stimuli [Mainen and Sejnowski, 1995], that is, a set of
inputs for the visual system that are randomly generated but can be
presented many times in a strictly identical manner.
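
A minimal sketch of this frozen-noise principle, assuming that the random
phases are drawn with NumPy's seeded generator (the helper name random_phases
and the seed values are hypothetical):

```python
import numpy as np


def random_phases(shape, seed):
    """Draw uniform phases in [0, 2*pi) from a generator with a fixed seed."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 2.0 * np.pi, size=shape)


shape = (16, 16, 16)
frozen_a = random_phases(shape, seed=2012)
frozen_b = random_phases(shape, seed=2012)  # same seed: identical phases
fresh = random_phases(shape, seed=2013)     # new seed: a new instance

print(np.array_equal(frozen_a, frozen_b))
print(np.array_equal(frozen_a, fresh))
```

Storing the seed together with the envelope parameters should therefore be
enough to regenerate a stimulus, provided the same software versions are
used.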

4.2 Comparing broad band and narrow band motion stimuli 

Varying the spatiotemporal frequency distribution from a grating-like
stimulus to complex random phase textures should be a powerful method for
investigating neuronal selectivity and cortical maps of extra-striate areas.
Motion Clouds (and other types of RPT patterns) should be able to drive
cortical neurons known to receive converging inputs from several
spatiotemporal frequency channels [Rust and Movshon, 2005]. We have
considered generating stimuli to explore the effects of varying a single
bandwidth parameter, Bf, while setting BV and Bθ to fixed, relatively low
values (to retain some precision along these components). We use the name
BroadBand stimuli (BB) for the MCs with a large Bf, whereas NarrowBand
stimuli (NB) are MCs characterized by small Bf values. As illustrated in
Figure 4, BB and NB clouds are both symmetric, airfoil-shaped volumes.
However, the broadband envelope contains more frequency components than the
narrow band one and is therefore thought to better represent natural images.
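
To give a rough feeling for the difference between NB and BB stimuli, the
sketch below measures the width at half maximum of a radial frequency
envelope for three bandwidth values. The log-normal-like form of the
envelope, its parametrization, the central frequency of 0.125 and the
reading of all values as cycles per pixel are assumptions made for this
example, not definitions taken from the text.

```python
import numpy as np


def radial_envelope(f, f_0, b_f):
    """Log-normal-like bump around f_0 (assumed parametrization)."""
    sigma = np.log(1.0 + b_f / f_0)
    return np.exp(-0.5 * (np.log(f / f_0) / sigma) ** 2)


def half_max_width(f, envelope):
    """Extent of the frequency range where the envelope exceeds half its peak."""
    above = f[envelope >= 0.5 * envelope.max()]
    return above.max() - above.min()


f = np.linspace(0.001, 0.5, 2000)  # up to the Nyquist frequency
widths = {b_f: half_max_width(f, radial_envelope(f, 0.125, b_f))
          for b_f in (0.05, 0.15, 0.4)}
for b_f, width in widths.items():
    print(f"Bf = {b_f}: half-maximum width = {width:.3f}")
```

Under these assumptions the range of frequencies carrying substantial energy
grows with Bf, until it is limited by the sampling of the display.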

Recently, we have used such Motion Cloud stimuli to investigate how 
the visual system integrates different spatial frequency information lev- 
els, by varying Bf across a large range of spatial frequencies [Simoncini 
et al., 2011]. The stimuli were displayed using Psychtoolbox v3 [Brainard,
1997, Pelli, 1997] for MATLAB (http://psychtoolbox.org) on a CRT monitor
(1280 x 1024@100Hz). They covered 47 degrees of visual angle at a viewing 
distance of 57 cm. We have used these stimuli to understand how two differ- 
ent visual behaviors, perceptual speed discrimination and reflexive tracking, 
would take advantage of presenting a single speed at different spatial scales. 
We found that the visual system pools motion information adaptively, as a 
function of constraints raised by different tasks. Motion Clouds were found
to be useful to resolve problems associated with the integration of
multiple spatial frequencies, as they allow precise control of all variables
related to speed and frequency content. In particular, previous studies
have failed to understand how speed information is reconstructed across
different spatial frequencies because the mixing of two, or more, gratings
poses several perceptual problems. First, depending on the phase
relationship between spatial frequency components, different interference
patterns would appear, generating second-order motion in the same or the
opposite motion direction [Smith and Edgar, 1990]. Second, mixing sparse
RDKs moving at the same speed but band-pass filtered at different spatial
frequencies results in complex patterns that can be perceived as being
either coherent or transparent [Watson and Eckert, 1994]. The same
difficulties have been encountered by neurophysiological studies trying to
understand the origin of speed selectivity in V1 complex cells [Priebe et
al., 2006] or MT neurons [Priebe et al., 2003].

4.3 Clouds with competing motions 

Low and mid-level visual integration and segmentation mechanisms have 
been extensively investigated with either combinations of gratings (i.e. plaid 
patterns) or random dot patterns with different directions, speed and/or 
spatiotemporal components. Such plaid stimuli have been extensively stud- 
ied and constitute an important pillar in motion detection theories, such as 
the separation between component and pattern cells in area MT (see [Born 

Figure 3: Equivalent MC representations of some classical stimuli. (A, top):
a narrow-orientation-bandwidth Motion Cloud produced only with vertically
oriented kernels and a horizontal mean motion to the right (Supplemental
Movie 1). (Bottom): The spectral envelope is concentrated on a pair of
patches centered on a constant speed surface. Note that this "speed plane"
is thin (as seen by the projection onto the (fx, ft) face), yet it has a
finite thickness, resulting in small, local, jittering motion components.
(B): a Motion Cloud illustrating the aperture problem. (Top): The stimulus,
having an oblique preferred orientation θ and a narrow bandwidth
(Bθ = π/36), is moving horizontally and rightwards. However, the perceived
direction of motion in such a case is biased towards the oblique downwards
direction, i.e., orthogonal to the orientation, consistent with the fact
that the best speed plane is ambiguous to detect (Supplemental Movie 2).
(C): a low-coherence random-dot kinematogram-like Motion Cloud: its
orientation and speed bandwidths, Bθ and BV respectively, are large,
yielding a low-coherence stimulus in which no edges can be identified
(Supplemental Movie 3).

Figure 4: Broadband vs. narrowband stimuli. From (A) through (B) to
(C) the frequency bandwidth Bf increases, while all other parameters (such
as f0) are kept constant. The Motion Cloud with the broadest bandwidth
is thought to best represent natural stimuli since, like them, it contains
many frequency components. (A) Bf = 0.05 (Supplemental Movie 4), (B)
Bf = 0.15 (Supplemental Movie 5) and (C) Bf = 0.4 (Supplemental Movie
6).
and Bradley, 2005, Burr and Thompson, 2011, Movshon et al., 1985] for
reviews). However, there has been a long-standing controversy about which
information can be used in plaid patterns, such as component gratings, their
product or 2D features called blobs that are generated at the intersection
between component gratings [Derrington et al., 2004]. It is also unclear how
different direction and spatial frequency channels are mixed to create
pattern direction selectivity [Rust et al., 2006]. As explained above,
Motion Clouds stimuli are by definition less prone to creating interference
patterns (or Moiré patterns) when mixed together. This is a striking
difference with respect to classical low-entropy stimuli, such as gratings.
Being able to mix together two textures with the same motion but different
characteristic spatial frequencies is also critical to further study motion
integration (e.g. single neuron selectivity: [Rust and Movshon, 2005];
ocular tracking behavior: [Masson and Castet, 2002]; motion perception:
[Smith and Edgar, 1990]). Conversely, it must also be possible to mix two
textures with different motions to study the competition between
integration and segmentation, leading to different percepts such as
coherent or transparent motions.

Using Motion Clouds, a number of further combinations may be of interest
for studying motion detection. We illustrate several possibilities in
Figure 5. The left panel shows a standard Motion Cloud with added explicit
noise, corresponding to an envelope broadly centered around V = 0. The
middle panel illustrates the plaid-equivalent Motion Cloud obtained by
adding two Motion Clouds of the same velocity but different orientations,
similarly to a plaid stimulus. In the right panel, the two components have
different velocities (here opposite ones) while all other parameters are
identical. With standard gratings, two such gratings would interfere and
create a counter-phase, flickering stimulus. With Motion Clouds, there is
no such interference and the resulting stimulus has all the desired energy
distributed on both speed planes. By varying the relative direction of two,
or more, components, it becomes possible to produce several transparent
patterns and therefore to overcome a limitation of classical motion stimuli
such as gratings.
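
The interference obtained with standard gratings follows from a
trigonometric identity: two sinusoidal gratings with the same spatial
frequency drifting in opposite directions sum to a standing wave whose
contrast flickers in time while its zero crossings stay fixed. The short
check below verifies this numerically; the spatial and temporal frequencies
are arbitrary values chosen for the example.

```python
import numpy as np

x = np.linspace(0, 1, 64, endpoint=False)[:, None]  # space, arbitrary units
t = np.linspace(0, 1, 32, endpoint=False)[None, :]  # time, arbitrary units
k, w = 4, 2  # spatial and temporal frequencies (example values)

rightward = np.sin(2 * np.pi * (k * x - w * t))
leftward = np.sin(2 * np.pi * (k * x + w * t))
sum_of_gratings = rightward + leftward

# sin(a - b) + sin(a + b) = 2 sin(a) cos(b): a counter-phase grating.
standing = 2 * np.sin(2 * np.pi * k * x) * np.cos(2 * np.pi * w * t)
print(np.allclose(sum_of_gratings, standing))

# The nodes of the standing wave never move: every 8th sample stays at zero.
print(np.abs(sum_of_gratings[::8]).max())
```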

5 Discussion 

In this article we have described the mathematical framework and provided
the computer implementation of a set of complex stimuli that we call Motion
Clouds. They are an instance of a more generic class of stimuli called
Random Phase Textures. These stimuli, presented herein in the context of
visual motion perception, represent an attempt to fill the gap between simple 
stimuli (such as spots of light or sinusoidal gratings), stimulus ensembles 
consisting of simple stimuli (for instance, white noise patterns) and natural 
stimuli [Felsen and Dan, 2005, Rust and Movshon, 2005]. Similar approaches 

Figure 5: Competing Motion Clouds. (A): A narrow-orientation-bandwidth
Motion Cloud with explicit noise. A red noise envelope was added to the
global envelope of a Motion Cloud with a narrow bandwidth in the orientation
domain (Supplemental Movie 7). (B): Two Motion Clouds with the same motion
but different preferred orientations were added together, yielding a
plaid-like Motion Cloud texture (Supplemental Movie 8). (C): Two Motion
Clouds with opposite velocity directions were added, yielding a texture
similar to a "counter-phase" grating (Supplemental Movie 9). Note that the
crossed shape in the (fx, ft) plane is a signature of the opposite velocity
directions, while two gratings with the same spatial frequency and opposite
directions would generate a flickering stimulus with energy concentrated on
the ft plane.
have been used before in the case of motion detection [Schrater et al., 2000]
but the stimuli were described in a somewhat incomplete and inaccessible
way. Here, our goal was to provide a complete and rigorous mathematical
description of those stimuli, as well as tools for generating them. We have
also given a few examples of different subsets of Motion Clouds that could
be used for probing detection, integration and segmentation stages at both
psychophysical and neurophysiological levels. To conclude, we indicate a
few future extensions and possible uses.

5.1 Embedding spectral properties of natural images 

Both sensory and motor functions are natural tasks and therefore it is es- 
sential to understand how they deal with natural stimuli. Following the 
core principles of Natural Systems Analysis [Geisler and Ringach, 2009], 
we think it is possible to extend our model of natural stimulation to carry 
out different and more complex experiments in the visual system. Seminal
work by [Zeki, 1983] showed that there exists a selectivity for color
in higher visual areas such as V4 of the macaque monkey. Moreover, the
work by [Conway et al., 2007] shows that in the extra-striate cortex (V3,
V4, Inferior Temporal Cortex) color is processed in terms of the full range
of hues found in color space. The spatial structure is represented by 'globs'
which are clustered by color preference, and organized as color columns. It
is therefore important to develop an extension of MCs to be able to probe
color vision. A first approach would consist in creating a simple colored MC
using an RGB scheme [Galerne et al., 2010]. In this case we should add
the same uniform random phase to each color channel. More realistically,
a short-, medium- and long-wavelength cone (SML) scheme will have to be
used, taking into account the cone fundamentals. Such color texture stimuli
would make it possible to design a wide variety of new psychophysics
experiments related to color perception.
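
The RGB approach can be sketched as follows for a static 2D texture: each
color channel receives its own amplitude envelope, but all channels share a
single random phase field, which keeps their spatial structure aligned. The
function name colored_texture, the ring-shaped envelope and the channel gains
are hypothetical choices made for this example only.

```python
import numpy as np


def colored_texture(envelopes, seed=0):
    """Random-phase RGB texture from a (3, n_x, n_y) stack of envelopes."""
    envelopes = np.asarray(envelopes, dtype=float)
    rng = np.random.default_rng(seed)
    # One phase field, shared by the three color channels.
    phase = rng.uniform(0, 2 * np.pi, size=envelopes.shape[1:])
    spectrum = envelopes * np.exp(1j * phase)
    channels = np.fft.ifft2(spectrum, axes=(-2, -1)).real
    return np.moveaxis(channels, 0, -1)  # shape (n_x, n_y, 3)


n = 32
f_x, f_y = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")
f_r = np.hypot(f_x, f_y)
base = np.exp(-0.5 * ((f_r - 0.125) / 0.03) ** 2)  # ring around 0.125 c/pixel
gains = np.array([1.0, 0.6, 0.3])                  # example channel gains
texture = colored_texture(gains[:, None, None] * base, seed=1)
print(texture.shape)
```

In this simple case the three channels are scaled copies of one another.
Giving each channel an envelope of a different shape would decorrelate them
only partially, since the phases remain common.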

5.2 Exploiting phase parameters: towards a systematic ex- 
ploration of the role of geometry 

The amplitude spectra of natural images are characterized by their 1/f shape;
in consequence, the global power spectrum cannot provide much information
on any natural image that can be used, for instance, for fine pattern
recognition or classification (e.g. [Victor and Conte, 1996], see also
[Oliva and Torralba, 2001, Torralba and Oliva, 2003]). Information contained
within the phase spectrum is therefore the key to identifying the contents
of the images, i.e., how shape is coded in natural images. This implies that
the visual system must be sensitive to the phase structure of artificial
stimuli or natural images, at least at some spatial scales [Hansen and Hess,
2006, Phillips and Todd, 2010]. This could be related to the rich
representation of phase provided by the receptive field structure of visual
neurons, from primary visual cortex up to extra-striate areas and further.
We believe that Motion Clouds, and to a larger extent Random Phase Textures,
are powerful tools to probe the properties of phase-sensitive mechanisms in
neural populations.
In the cases presented above, patterns have parametrized random phases: 
phase values are drawn from a uniform probability distribution. However, 
one can evidently draw these phases according to some structure known a 
priori, for instance, by correlating the phase of edges with similar orienta- 
tions. This would progressively introduce collinearities in the set of stimuli, 
as needed to trigger short-range properties within the so-called association 
fields [Hess et al., 2003]. By manipulating these parameters, we shall be able 
to control for the detailed information content in the different axes of the 
corresponding associative field, for instance the role of collinearity versus 
cocircularity [Perrinet et al., 2011]. 

5.3 Extension to increased complexity 

Random textured dynamical stimuli are generated as instances of a few
random variables defined by a generative model of synthesis. As a
consequence, one may control the structural complexity of these synthetic
textures by tuning the structure of the generative equations. In fact, the
geometry of the visual world can be handled by using models to deal directly
with the statistics of concurrent parameters, for instance edges or
textures. For example, within each texture and/or edge class,
low-dimensional models control the complexity of the stimuli using a few
meaningful inputs (regularity of edges, number of crossings, curvature of
the texture flow, etc.). This complexity parametrization gives access to
both the local geometry of the image (for instance its local orientation,
frequency, scale, granularity) and to more global integration properties
(good continuation of edges, approximate periodicity).

These models can be assembled, thus leading to a rich content that mixes
edge and texture patterns. It is believed that this hierarchical structure of
generative models maps one-to-one onto the structure of the visual
system, from the detection of moving contrasts in the retina through edges
in the primary visual cortex up to higher order attributes like motion and
shape. By designing such models with increasing scales of complexity, it
should therefore be possible to specifically target structures in the
low-level visual system, such as V1, V2, V4 and MT. The generative
framework underlying Motion Clouds can make an important contribution
to this long-term goal.

5.4 Exporting Motion Clouds to other sensory modalities

Strong parallels have been drawn between visual and haptic processing for 
low level encoding of motion information, for instance [Pei et al., 2011]. 
Simple stimuli like drifting relief gratings, dynamic noise patterns or single 
elements such as lines and spots have already been used to investigate the 
properties of the somatosensory system. There is also a strong need to de- 
velop more sophisticated stimuli that can reproduce, in a controllable way, 
the statistics of natural somesthetic inputs. The theoretical framework de- 
scribed in the present article may also be used to design somesthetic inputs 
using mechanical actuators to excite the vibrissal array of rats' whiskers [Ja- 
cob et al., 2008]. This is another potential application of a set of stimuli 
bridging the gap between artificial and natural sensory input. 